In this notebook we ingest and visualize the mobility trends data provided by Apple, [APPL1].
We take the following steps:
Download the data
Import the data and summarise it
Transform the data into long form
Partition the data into subsets that correspond to combinations of geographical regions and transportation types
Make contingency matrices and corresponding heat-map plots
Make nearest neighbors graphs over the contingency matrices and plot communities
Plot the corresponding time series
About This Data
The CSV file and charts on this site show a relative volume of directions requests per country/region or city compared to a baseline volume on January 13th, 2020. We define our day as midnight-to-midnight, Pacific time. Cities represent usage in greater metropolitan areas and are stably defined during this period. In many countries/regions and cities, relative volume has increased since January 13th, consistent with normal, seasonal usage of Apple Maps. Day of week effects are important to normalize as you use this data. Data that is sent from users’ devices to the Maps service is associated with random, rotating identifiers so Apple doesn’t have a profile of your movements and searches. Apple Maps has no demographic information about our users, so we can’t make any statements about the representativeness of our usage against the overall population.
The observations listed in this subsection are also placed under the relevant statistics in the following sections and indicated with “Observation”.
The reference date for normalizing the directions requests volumes is 2020-01-13: all the values in that column are \(100\).
From the community clusters of the nearest neighbor graphs (derived from the time series of the normalized driving directions requests volume) we see that countries and cities are clustered in expected ways. For example, in the community graph plot corresponding to “{city, driving}” the cities Oslo, Copenhagen, Helsinki, Stockholm, and Zurich are placed in the same cluster. In the graphs corresponding to “{city, transit}” and “{city, walking}” the Japanese cities Tokyo, Osaka, Nagoya, and Fukuoka are clustered together.
In the time series plots the Sundays are indicated with orange dashed lines. We can see that from Monday to Thursday people are more familiar with their trips than, say, on Fridays and Saturdays. We can also see that on Sundays people (on average) are more familiar with their trips or simply travel less.
library(Matrix)
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ──────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.3 ✓ purrr 0.3.4
✓ tibble 3.0.6 ✓ dplyr 1.0.4
✓ tidyr 1.1.2 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.1
── Conflicts ──────────────────── tidyverse_conflicts() ──
x tidyr::expand() masks Matrix::expand()
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
x tidyr::pack() masks Matrix::pack()
x tidyr::unpack() masks Matrix::unpack()
library(ggplot2)
library(gridExtra)
Attaching package: ‘gridExtra’
The following object is masked from ‘package:dplyr’:
combine
library(d3heatmap)
library(igraph)
Attaching package: ‘igraph’
The following objects are masked from ‘package:dplyr’:
as_data_frame, groups, union
The following objects are masked from ‘package:purrr’:
compose, simplify
The following object is masked from ‘package:tidyr’:
crossing
The following object is masked from ‘package:tibble’:
as_data_frame
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
library(zoo)
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
Apple's mobility data was provided on this web page: https://www.apple.com/covid19/mobility , [APPL1]. (The data has to be downloaded from that web page, since there is an “agreement to terms”, etc.)
dfAppleMobility <- read.csv( "~/Downloads/applemobilitytrends-2021-01-15.csv", stringsAsFactors = FALSE)
#dfAppleMobility <- read.csv("https://covid19-static.cdn-apple.com/covid19-mobility-data/2024HotfixDev18/v3/en-us/applemobilitytrends-2021-01-15.csv")
names(dfAppleMobility) <- gsub( "^X", "", names(dfAppleMobility))
names(dfAppleMobility) <- gsub( ".", "-", names(dfAppleMobility), fixed = TRUE)
dfAppleMobility
Observation: The reference date for normalizing the directions requests volumes is 2020-01-13: all the values in that column are \(100\).
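The two gsub calls in the import step undo the renaming that read.csv applies by default: column names that are not syntactically valid R names, such as dates, are passed through make.names. Here is a minimal sketch on a made-up two-row CSV (the name csvText and its contents are hypothetical, not Apple's data), together with the check.names = FALSE alternative that keeps the header as it is.

```r
# Hypothetical miniature CSV with the same kind of header as the Apple file.
csvText <- "geo_type,region,sub-region,2020-01-13,2020-01-14
city,Oslo,,100,101.5
city,Tokyo,,100,98.2"

# Default import: names are made syntactically valid
# (dates get an "X" prefix, dashes become dots).
dfDemo <- read.csv(text = csvText, stringsAsFactors = FALSE)
print(names(dfDemo))

# The same clean-up as in the notebook.
names(dfDemo) <- gsub("^X", "", names(dfDemo))
names(dfDemo) <- gsub(".", "-", names(dfDemo), fixed = TRUE)

# Alternative: ask read.csv not to touch the header at all.
dfDemo2 <- read.csv(text = csvText, stringsAsFactors = FALSE, check.names = FALSE)
print(identical(names(dfDemo), names(dfDemo2)))
```

Note that the "^X" pattern would also strip a leading "X" from any genuine column name that starts with one; the ID columns in this dataset do not.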
Data dimensions:
dim(dfAppleMobility)
[1] 4691 375
Data summary:
summary(as.data.frame(unclass(dfAppleMobility[,1:3]), stringsAsFactors = TRUE))
       geo_type                  region      transportation_type
 city          : 790   Washington County:  27   driving:3048
 country/region: 153   Jefferson County :  25   transit: 551
 county        :2638   Montgomery County:  24   walking:1092
 sub-region    :1110   Franklin County  :  22
                       Madison County   :  21
                       Jackson County   :  19
                       (Other)          :4553
Number of unique “country/region” values:
dfAppleMobility %>%
dplyr::filter( geo_type == "country/region") %>%
dplyr::pull("region") %>%
unique %>%
length
[1] 63
Number of unique “city” values:
dfAppleMobility %>%
dplyr::filter( geo_type == "city") %>%
dplyr::pull("region") %>%
unique %>%
length
[1] 295
All unique geo types:
lsGeoTypes <- unique(dfAppleMobility[["geo_type"]])
lsGeoTypes
[1] "country/region" "city" "sub-region" "county"
All unique transportation types:
lsTransportationTypes <- unique(dfAppleMobility[["transportation_type"]])
lsTransportationTypes
[1] "driving" "walking" "transit"
It is better to have the data in long form (narrow form). For that I am using the package “tidyr”.
# lsIDColumnNames <- c("geo_type", "region", "transportation_type") # For the initial dataset released by Apple.
lsIDColumnNames <- c("geo_type", "region", "transportation_type", "alternative_name", "sub-region", "country" )
dfAppleMobilityLongForm <- tidyr::pivot_longer( data = dfAppleMobility, cols = setdiff( names(dfAppleMobility), lsIDColumnNames), names_to = "Date", values_to = "Value" )
dim(dfAppleMobilityLongForm)
[1] 1730979 8
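The row count above can be checked by hand: the wide table has 4691 rows and 375 columns, 6 of which are ID columns, so the long form has 4691 × 369 = 1730979 rows. The following sketch repeats the same pivot on a tiny, made-up wide table (dfWide and its values are hypothetical) to show the shape change.

```r
library(tidyr)

# Hypothetical wide table: 2 rows, 3 ID columns, 2 date columns.
dfWide <- data.frame(
  geo_type = c("city", "city"),
  region = c("Oslo", "Tokyo"),
  transportation_type = c("driving", "walking"),
  "2020-01-13" = c(100, 100),
  "2020-01-14" = c(101.2, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

lsIDs <- c("geo_type", "region", "transportation_type")

# Every non-ID column becomes a (Date, Value) pair of its own row.
dfLong <- tidyr::pivot_longer(
  data = dfWide,
  cols = setdiff(names(dfWide), lsIDs),
  names_to = "Date",
  values_to = "Value"
)

# 2 rows x 2 date columns = 4 rows; 3 ID columns + Date + Value = 5 columns.
print(dim(dfLong))

# The same arithmetic for the dimensions reported in the notebook.
print(4691 * (375 - 6))
```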
Remove the rows with “empty” values:
dfAppleMobilityLongForm <- dfAppleMobilityLongForm[ complete.cases(dfAppleMobilityLongForm), ]
dim(dfAppleMobilityLongForm)
[1] 1709416 8
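The filter drops 1730979 − 1709416 = 21563 rows. complete.cases only looks for NA values, so rows whose text fields are empty strings are kept. A small sketch with made-up values (dfDemo is hypothetical) shows the difference.

```r
# Hypothetical long-form fragment: one missing Value, two empty alternative names.
dfDemo <- data.frame(
  region = c("Oslo", "Oslo", "Tokyo"),
  alternative_name = c("", "", "Tokio"),
  Value = c(100, NA, 98.2),
  stringsAsFactors = FALSE
)

# Keep only the rows without NA in any column.
dfKept <- dfDemo[complete.cases(dfDemo), ]

# The row with Value = NA is gone; the row with an empty alternative_name stays.
print(dfKept)

# Number of rows removed from the notebook's long-form data.
print(1730979 - 1709416)
```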
Add the “DateObject” column:
dfAppleMobilityLongForm$DateObject <- as.POSIXct( dfAppleMobilityLongForm$Date, format = "%Y-%m-%d", origin = "1970-01-01" )
Add “day name” (“day of the week”) field:
dfAppleMobilityLongForm$DayName <- weekdays(dfAppleMobilityLongForm$DateObject)
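Two details of the date handling are worth a small illustration. First, as.POSIXct parses character input in the session time zone, and the origin argument is only used for numeric input. Second, weekdays returns names in the current locale, so comparisons with "Sunday" only work in an English locale. A minimal sketch of a locale-independent alternative, using as.Date and the ISO weekday number (the variable names are hypothetical):

```r
# A few dates from the start of the data set.
lsDates <- c("2020-01-13", "2020-01-18", "2020-01-19")

# Plain dates carry no time zone, which is enough for daily data.
dates <- as.Date(lsDates, format = "%Y-%m-%d")

# "%u" gives the ISO weekday number: 1 = Monday, ..., 7 = Sunday.
isoDay <- as.integer(format(dates, "%u"))
isSunday <- isoDay == 7

print(data.frame(Date = lsDates, ISODay = isoDay, IsSunday = isSunday))
```

The baseline date 2020-01-13 is a Monday, so every seventh column after 2020-01-19 is a Sunday.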
Here is a sample of the transformed data:
set.seed(3232)
dfAppleMobilityLongForm %>% dplyr::sample_n( 10 )
Here is a summary:
summary(as.data.frame(unclass(dfAppleMobilityLongForm), stringsAsFactors = TRUE))
geo_type               region                     transportation_type  alternative_name                 sub.region
city          :289938  Washington County:   9919  driving:1104303                             :1335188            :486577
country/region: 56151  Jefferson County :   9187  transit: 202875      AB                     :   1105  Texas     : 88523
county        :969242  Montgomery County:   8826  walking: 402238      ACT                    :   1105  California: 61030
sub-region    :394085  Franklin County  :   8078                       Andalucía              :   1105  Georgia   : 48119
                       Madison County   :   7713                       Bayern                 :   1105  Virginia  : 45183
                       Jackson County   :   6979                       BC|Colombie-Britannique:   1105  Florida   : 44493
                       (Other)          :1658714                       (Other)                : 368703  (Other)   :935491

country                Date                Value            DateObject                   DayName
United States:1139718  2020-01-13:   4652  Min.   :   0.44  Min.   :2020-01-13 00:00:00  Friday   :246556
Japan        :  81295  2020-01-14:   4652  1st Qu.:  84.26  1st Qu.:2020-04-13 00:00:00  Monday   :242970
             :  56151  2020-01-15:   4652  Median : 113.54  Median :2020-07-16 00:00:00  Saturday :241904
France       :  33098  2020-01-16:   4652  Mean   : 121.59  Mean   :2020-07-15 06:51:04  Sunday   :241904
Germany      :  31608  2020-01-17:   4652  3rd Qu.: 148.41  3rd Qu.:2020-10-16 00:00:00  Thursday :246556
Thailand     :  24968  2020-01-18:   4652  Max.   :2148.12  Max.   :2021-01-15 00:00:00  Tuesday  :242970
(Other)      : 342578  (Other)   :1681504                                                Wednesday:246556
Partition the data into geo types × transportation types:
dfAppleMobilityLongForm %>%
dplyr::group_by( geo_type, transportation_type) %>%
dplyr::count()
aQueries <- split(dfAppleMobilityLongForm, dfAppleMobilityLongForm[,c("geo_type", "transportation_type")] )
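When split is given two grouping columns it builds one subset per combination of their values and names each subset by joining the values with a dot, for example "city.driving". Combinations that do not occur in the data still get an (empty) entry. A minimal sketch with made-up rows (dfDemo is hypothetical):

```r
# Hypothetical long-form fragment: there is no "county" + "transit" row.
dfDemo <- data.frame(
  geo_type = c("city", "city", "county"),
  transportation_type = c("driving", "transit", "driving"),
  Value = c(100, 95, 110),
  stringsAsFactors = FALSE
)

# One data frame per geo_type x transportation_type combination.
aParts <- split(dfDemo, dfDemo[, c("geo_type", "transportation_type")])

# Names are "<geo_type>.<transportation_type>"; note the empty "county.transit".
print(sapply(aParts, nrow))

# Dropping the empty subsets.
aNonEmpty <- aParts[sapply(aParts, nrow) > 0]
print(names(aNonEmpty))
```

Passing drop = TRUE to split would omit the empty combinations at the splitting step.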
We can visualize the data using heat-map plots.
Remark: Using the contingency matrices prepared for the heat-map plots we can do further analysis, such as finding correlations or nearest neighbors. (See below.)
Cross-tabulate dates with regions:
aMatDateRegion <- purrr::map( aQueries, function(dfX) { xtabs( formula = Value ~ Date + region, data = dfX, sparse = TRUE ) } )
aMatDateRegion <- aMatDateRegion[ purrr::map_lgl(aMatDateRegion, function(x) nrow(x) > 0 ) ]
dfPlotQuery <- purrr::map_df( aMatDateRegion, Matrix::summary, .id = "Type" )
head(dfPlotQuery)
367 x 295 sparse Matrix of class "dgCMatrix", with 108265 entries
Type i j x
1 city.driving 1 1 100.00
2 city.driving 2 1 100.73
3 city.driving 3 1 102.86
4 city.driving 4 1 102.65
5 city.driving 5 1 109.39
6 city.driving 6 1 109.62
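In the output above, i and j are the row (date) and column (region) indices of the non-zero entries of each sparse matrix, and x is the value. The following sketch reproduces the cross-tabulation and the triplet summary on four made-up records (dfDemo is hypothetical).

```r
library(Matrix)

# Hypothetical long-form fragment: two dates, two regions.
dfDemo <- data.frame(
  Date = c("2020-01-13", "2020-01-14", "2020-01-13", "2020-01-14"),
  region = c("Oslo", "Oslo", "Tokyo", "Tokyo"),
  Value = c(100, 98.5, 100, 103.2),
  stringsAsFactors = FALSE
)

# Dates become rows, regions become columns, Value is summed per cell.
matDemo <- xtabs(Value ~ Date + region, data = dfDemo, sparse = TRUE)
print(dim(matDemo))
print(dimnames(matDemo))

# Triplet (i, j, x) representation, as used for the heat-map plots.
dfTriplets <- Matrix::summary(matDemo)
print(dfTriplets)
```

Since each date-region pair occurs once within a geo type and transportation type, the "sum" in each cell is just the single observed value.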
ggplot2::ggplot(dfPlotQuery) +
ggplot2::geom_tile( ggplot2::aes( x = j, y = i, fill = log10(x)), color = "white") +
ggplot2::scale_fill_gradient(low = "white", high = "blue") +
ggplot2::xlab("Region") + ggplot2::ylab("Date") +
ggplot2::facet_wrap( ~Type, scales = "free", ncol = 2)
Here we take a “closer look” at one of the plots using a dedicated d3heatmap plot:
d3heatmap::d3heatmap( x = aMatDateRegion[["country/region.driving"]], Rowv = FALSE )
Here we create nearest neighbor graphs from the contingency matrices computed above, then cluster the nodes and plot the clusters:
th <- 0.94
aNNGraphs <-
purrr::map( aMatDateRegion, function(m) {
m2 <- cor(as.matrix(m))
for( i in 1:nrow(m2) ) {
m2[i,i] <- 0
}
m2 <- as( m2, "dgCMatrix")
m2@x[ m2@x <= th ] <- 0
#m2@x[ m2@x > th ] <- 1
igraph::graph_from_adjacency_matrix(Matrix::drop0(m2), weighted = TRUE, mode = "undirected")
})
ind <- 3
ceb <- cluster_edge_betweenness(aNNGraphs[[ind]])
dendPlot(ceb, mode="hclust", main = names(aNNGraphs)[[ind]])
plot(ceb, aNNGraphs[[ind]], vertex.size=1, vertex.label=NA, main = names(aNNGraphs)[[ind]])
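The graph construction above connects two regions when the correlation of their time series exceeds the threshold th. A minimal sketch of the same idea on four synthetic series (the matrix matDemo and its column names are made up; a dense matrix is used instead of "dgCMatrix" for brevity):

```r
library(igraph)

# Four synthetic series: "b" is a linear function of "a", "d" of "c".
t <- 1:20
matDemo <- cbind(a = t, b = 2 * t + 1, c = sin(t), d = 3 * sin(t) - 2)

th <- 0.94

# Correlations between columns; self-correlations are removed.
m2 <- cor(matDemo)
diag(m2) <- 0

# Keep only the strong correlations as edge weights.
m2[m2 <= th] <- 0

g <- igraph::graph_from_adjacency_matrix(m2, weighted = TRUE, mode = "undirected")

# Two edges (a-b and c-d), hence two connected groups.
print(igraph::ecount(g))
print(igraph::components(g)$membership)
```

Connected components are used here only to keep the sketch short; the notebook itself uses cluster_edge_betweenness to find communities.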
In this section, for each date and for each pair of geo type and transportation type, we sum the values over all regions, make the corresponding time series, and plot them.
Remark: In the plots the Sundays are indicated with orange dashed lines.
Here we make the time series:
aDateStringToDateObject <- unique( dfAppleMobilityLongForm[, c("Date", "DateObject")] )
aDateStringToDateObject <- setNames( aDateStringToDateObject$DateObject, aDateStringToDateObject$Date )
aDateStringToDateObject <- as.POSIXct(aDateStringToDateObject)
aTSDirReqByCountry <- purrr::map( aMatDateRegion, function(m) rowSums(m) )
matTS <- do.call( cbind, aTSDirReqByCountry)
number of rows of result is not a multiple of vector length (arg 1)
zooObj <- zoo::zoo( x = matTS, as.POSIXct(rownames(matTS)) )
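The message printed by cbind above indicates that the per-type vectors do not all have the same length, in which case cbind recycles the shorter ones instead of matching them by date. Whether that affects the plots here cannot be told from the output alone. A minimal sketch, with made-up sums, of aligning the vectors on the union of their date names before binding (aTS, lsDates and matAligned are hypothetical names):

```r
# Hypothetical per-type daily sums; the second one lacks 2020-01-14.
aTS <- list(
  city.driving = c("2020-01-13" = 200, "2020-01-14" = 195, "2020-01-15" = 190),
  city.transit = c("2020-01-13" = 100, "2020-01-15" = 90)
)

# All dates that occur in any of the series, in chronological order.
lsDates <- sort(unique(unlist(lapply(aTS, names))))

# Look every series up by date name; dates it does not have become NA.
matAligned <- sapply(aTS, function(v) unname(v[lsDates]))
rownames(matAligned) <- lsDates

print(matAligned)
```

The aligned matrix can be passed to zoo::zoo in the same way as matTS, with the missing dates shown as gaps rather than recycled values.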
Here we plot them:
autoplot(zooObj) +
aes(colour = NULL, linetype = NULL) +
facet_grid(Series ~ ., scales = "free_y") +
geom_vline( xintercept = aDateStringToDateObject[weekdays(aDateStringToDateObject) == "Sunday"], color = "orange", linetype = "dashed", size = 0.3 )
Observation: In the time series plots the Sundays are indicated with orange dashed lines. We can see that from Monday to Thursday people are more familiar with their trips than, say, on Fridays and Saturdays. We can also see that on Sundays people (on average) are more familiar with their trips or simply travel less.
Here we do a “forecast” for code-workflow demonstration purposes only; the forecasts should not be taken seriously.
Fit a time series model to the time series:
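# Fit ARIMA models; note that this result is overwritten by the next assignment.
aTSModels <- purrr::map( names(zooObj), function(x) { forecast::auto.arima( zoo( x = zooObj[,x], order.by = index(zooObj) ) ) } )
# Default forecast::forecast models over the plain numeric columns; these are the ones plotted below.
aTSModels <- purrr::map( names(zooObj), function(x) forecast::forecast( as.matrix(zooObj)[,x] ) )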
names(aTSModels) <- names(zooObj)
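Since as.matrix(zooObj)[,x] hands forecast::forecast a plain numeric vector, the model is not told about the weekly period in the data. One possible variation, shown here as a sketch on a synthetic daily series (xDemo is made up and is not the Apple data), is to wrap the values in a ts object with frequency = 7 so that a seasonal model can be considered.

```r
library(forecast)

# Synthetic daily series: 12 weeks with a 7-day cycle plus a little noise.
set.seed(3232)
nDays <- 84
xDemo <- 100 + 10 * sin(2 * pi * (1:nDays) / 7) + rnorm(nDays, sd = 1)

# Declare the weekly period explicitly.
tsDemo <- stats::ts(xDemo, frequency = 7)

# Forecast two weeks ahead; the model is selected automatically.
fcDemo <- forecast::forecast(tsDemo, h = 14)

print(fcDemo$method)
print(length(fcDemo$mean))
```

As with the forecasts in the notebook, this is a workflow illustration only.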
Plot data and forecast:
lsPlots <- purrr::map( names(aTSModels), function(x) autoplot(aTSModels[[x]]) + ylab("Volume") + ggtitle(x) )
names(lsPlots) <- names(aTSModels)
do.call( gridExtra::grid.arrange, lsPlots )
[APPL1] Apple Inc., Mobility Trends Reports, (2020), apple.com.
[AA1] Anton Antonov, “Apple mobility trends data visualization”, (2020), SystemModeling at GitHub.
[AA2] Anton Antonov, “NY Times COVID-19 data visualization”, (2020), SystemModeling at GitHub.